
    Feature Selection via Coalitional Game Theory

    We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the Multi-perturbation Shapley Analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver-operator-characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.
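
    The idea of ranking features by their game-theoretic contribution and then selecting them greedily can be illustrated with a small sketch. This is not the CSA implementation from the paper: it computes exact Shapley values over all orderings (feasible only for a handful of features), and `toy_score` is a hypothetical payoff table standing in for classifier performance on unseen data. The names `shapley_values`, `toy_score`, and `forward_select` are assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    players = list(players)
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            totals[p] += value(grown) - value(coalition)
            coalition = grown
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(features):
    """Hypothetical payoff (percentage points), NOT a real classifier."""
    score = 50
    if "a" in features:
        score += 20
    if "b" in features:
        score += 5
    if "a" in features and "c" in features:
        score += 10  # "c" only helps in combination with "a"
    return score


def forward_select(features, value, k):
    """Greedy forward selection: at each step, add the candidate with the
    largest Shapley value in the game played on top of the selected set."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        base = frozenset(selected)
        contrib = shapley_values(remaining, lambda s: value(s | base))
        best = max(remaining, key=contrib.get)
        selected.append(best)
        remaining.remove(best)
    return selected


print(shapley_values("abc", toy_score))
print(forward_select("abc", toy_score, k=2))
```

    In this toy game, feature "c" has a low contribution on its own but becomes the best candidate once "a" is selected, which is the kind of interaction a coalition-based score is meant to capture. A practical version would replace the exhaustive enumeration with sampled orderings and the payoff table with a trained classifier's validation score.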

    "We do not appreciate being experimented on": Developer and Researcher Views on the Ethics of Experiments on Open-Source Projects

    A tenet of open source software development is to accept contributions from user-developers (typically after appropriate vetting). But should this also include interventions done as part of research on open source development? Following an incident in which buggy code was submitted to the Linux kernel to see whether it would be caught, we conduct a survey among open source developers and empirical software engineering researchers to see what behaviors they think are acceptable. This covers two main issues: the use of publicly accessible information, and conducting active experimentation. The survey had 224 respondents. The results indicate that open-source developers are largely open to research, provided it is done transparently. In other words, many would agree to experiments on open-source projects if the subjects were notified and provided informed consent, and in special cases also if only the project leaders agree. While researchers generally hold similar opinions, they sometimes fail to appreciate certain nuances that are important to developers. Examples include observing license restrictions on publishing open-source code and safeguarding the code. Conversely, researchers seem to be more concerned than developers about privacy issues. Based on these results, it is recommended that open source repositories and projects address use for research in their access guidelines, and that researchers take care to ask permission also when not formally required to do so. We note too that the open source community wants to be heard, so professional societies and IRBs should consult with them when formulating ethics codes. Comment: 15 pages with 42 charts and 3 tables; accepted version

    Shedding light on a living lab: the CLEF NEWSREEL open recommendation platform

    In the CLEF NEWSREEL lab, participants are invited to evaluate news recommendation techniques in real time by providing news recommendations to actual users who visit commercial news portals to satisfy their information needs. A central role within this lab is the communication between participants and the users. This is enabled by the Open Recommendation Platform (ORP), a web-based platform which distributes users' impressions of news articles to the participants and returns their recommendations to the readers. In this demo, we illustrate the platform and show how requests are handled to provide relevant news articles in real time.

    Random Filters for Compressive Sampling and Reconstruction

    We propose and study a new technique for efficiently acquiring and reconstructing signals based on convolution with a fixed FIR filter having random taps. The method is designed for sparse and compressible signals, i.e., ones that are well approximated by a short linear combination of vectors from an orthonormal basis. Signal reconstruction involves a non-linear Orthogonal Matching Pursuit algorithm that we implement efficiently by exploiting the nonadaptive, time-invariant structure of the measurement process. While simpler and more efficient than other random acquisition techniques like Compressed Sensing, random filtering is sufficiently generic to summarize many types of compressible signals and generalizes to streaming and continuous-time signals. Extensive numerical experiments demonstrate its efficacy for acquiring and reconstructing signals sparse in the time, frequency, and wavelet domains, as well as piecewise smooth signals and Poisson processes.
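
    The two ingredients named in this abstract, measurement by convolution with a random-tap FIR filter followed by downsampling, and reconstruction by Orthogonal Matching Pursuit (OMP), can be sketched in NumPy as follows. This is a minimal sketch under several assumptions not stated in the abstract: the convolution is circular, the signal is sparse in the time domain, the taps are Gaussian, and OMP uses a dense least-squares solve rather than the efficient structured implementation the authors describe. The function names are hypothetical.

```python
import numpy as np


def random_filter_matrix(h, d, n_meas):
    """Measurement matrix for circular convolution with filter h (zero-padded
    to length d), keeping every (d // n_meas)-th output sample."""
    step = d // n_meas
    h_pad = np.zeros(d)
    h_pad[: len(h)] = h
    # Row n computes y[n] = sum_k h_pad[(n*step - k) mod d] * s[k].
    idx = (step * np.arange(n_meas)[:, None] - np.arange(d)[None, :]) % d
    return h_pad[idx]


def omp(Phi, y, m):
    """Plain OMP: pick m columns of Phi greedily, refitting by least squares.
    Columns are not normalized here, which a careful version would do."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(m):
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a column
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    s_hat = np.zeros(Phi.shape[1])
    s_hat[support] = coef
    return s_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_meas, taps = 128, 64, 16
    s = np.zeros(d)
    s[rng.choice(d, size=4, replace=False)] = rng.standard_normal(4)
    Phi = random_filter_matrix(rng.standard_normal(taps), d, n_meas)
    s_hat = omp(Phi, Phi @ s, m=4)
    print("max reconstruction error:", np.max(np.abs(s - s_hat)))
```

    Because every row of the matrix is a shifted copy of the same short filter, the measurement step needs only the filter taps rather than a full random matrix, which is the structural property the abstract refers to as nonadaptive and time-invariant.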

    Brown representability for space-valued functors

    In this paper we prove two theorems which resemble the classical cohomological and homological Brown representability theorems. The main difference is that our results classify small contravariant functors from spaces to spaces up to weak equivalence of functors. In more detail, we show that every small contravariant functor from spaces to spaces which takes coproducts to products up to homotopy and takes homotopy pushouts to homotopy pullbacks is naturally weakly equivalent to a representable functor. The second representability theorem states: every contravariant continuous functor from the category of finite simplicial sets to simplicial sets taking homotopy pushouts to homotopy pullbacks is equivalent to the restriction of a representable functor. This theorem may be considered as a contravariant analog of Goodwillie's classification of linear functors. Comment: 19 pages, final version, accepted by the Israel Journal of Mathematics

    When Are Names Similar Or the Same? Introducing the Code Names Matcher Library

    Program code contains functions, variables, and data structures that are represented by names. To promote human understanding, these names should describe the role and use of the code elements they represent. But the names given by developers show high variability, reflecting the tastes of each developer, with different words used for the same meaning or the same words used for different meanings. This makes comparing names hard. A precise comparison should be based on matching identical words, but also take into account possible variations on the words (including spelling and typing errors), reordering of the words, matching between synonyms, and so on. To facilitate this we developed a library of comparison functions specifically targeted to comparing names in code. The different functions calculate the similarity between names in different ways, so a researcher can choose the one appropriate for his specific needs. All of them share an attempt to reflect human perceptions of similarity, at the possible expense of lexical matching. Comment: 20 pages. Download from https://pypi.org/project/namecompare
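
    To make the comparison criteria above concrete (identical words, spelling variations, word reordering), here is a minimal sketch using only the Python standard library. It does not use or reproduce the API of the library described in the abstract; `split_name`, `word_similarity`, and `name_similarity` are hypothetical helpers, and synonym matching is deliberately left out because it would need an external dictionary.

```python
import re
from difflib import SequenceMatcher


def split_name(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def word_similarity(a, b):
    """Character-level similarity in [0, 1]; tolerates small typos."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1, name2):
    """Order-insensitive similarity: greedily pair each word of name1 with
    its best unused match in name2, then average over the longer name."""
    words1, words2 = split_name(name1), split_name(name2)
    if not words1 or not words2:
        return 0.0
    remaining = list(words2)
    total = 0.0
    for w in words1:
        if not remaining:
            break
        best = max(remaining, key=lambda v: word_similarity(w, v))
        total += word_similarity(w, best)
        remaining.remove(best)
    return total / max(len(words1), len(words2))


print(split_name("HTTPServerPort"))
print(name_similarity("maxItemCount", "item_count_max"))
print(name_similarity("itemCount", "itemCuont"))
```

    Dividing by the longer word list penalizes names that share some words but differ in length; other normalizations are equally defensible, which matches the abstract's point that different similarity functions suit different needs.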

    Comparing the Efficacy of Drug Regimens for Pulmonary Tuberculosis: Meta-analysis of Endpoints in Early-Phase Clinical Trials

    Background: A systematic review of early clinical outcomes in tuberculosis was undertaken to determine ranking of efficacy of drugs and combinations, define variability of these measures on different endpoints, and to establish the relationships between them. Methods: Studies were identified by searching PubMed, Medline, Embase, LILACS (Latin American and Caribbean Health Sciences Literature), and reference lists of included studies. Outcomes were early bactericidal activity results over 2, 7, and 14 days, and the proportion of patients with negative culture at 8 weeks. Results: One hundred thirty-three trials reporting phase 2A (early bactericidal activity) and phase 2B (culture conversion at 2 months) outcomes were identified. Only 9 drug combinations were assessed on >1 phase 2A endpoint and only 3 were assessed in both phase 2A and 2B trials. Conclusions: The existing evidence base supporting phase 2 methodology in tuberculosis is highly incomplete. In future, a broader range of drugs and combinations should be more consistently studied across a greater range of phase 2 endpoints.